-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation Overhaul for meds-tab #96
Conversation
added autogluon support, more models, more preprocessing strategies
WalkthroughThis pull request includes updates to several documentation files and configuration settings for the MEDS-Tab library. Key changes involve upgrading the version of pre-commit hooks, restructuring the README documentation for clarity, and introducing new scripts for generating reference pages. Additionally, some files related to documentation generation have been deleted, while new files and sections have been added to improve user guidance and project structure. Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## dev #96 +/- ##
=======================================
Coverage 81.91% 81.91%
=======================================
Files 20 20
Lines 1255 1255
=======================================
Hits 1028 1028
Misses 227 227 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (13)
mkdocs.yml (1)
1-5
: Consider enhancing the site description.The current description "Documentation for the meds-tab package" could be more informative. Consider expanding it to briefly mention the package's purpose (Tabular AutoML).
-site_description: Documentation for the meds-tab package +site_description: Documentation for meds-tab, a Tabular AutoML package for medical datapyproject.toml (1)
Line range hint
18-21
: Consider pinning versions for all dependencies.While many dependencies have specific versions, some core dependencies (
pyarrow
,loguru
,numpy
,pandas
,tqdm
,xgboost
,scikit-learn
) don't have version constraints. This could lead to compatibility issues if these packages release breaking changes.Consider adding version constraints for these packages to ensure reproducible builds.
docs/implementation.md (1)
95-95
: LGTM! Clear label requirements with room for enhancement.The added requirements clearly specify the necessary columns for binary classification tasks.
Consider adding a code snippet showing the expected label schema format for better clarity. For example:
# Example label schema { "subject_id": str, "prediction_time": datetime, "boolean_value": bool }README.md (1)
55-55
: Use proper heading syntax for installation sections.Instead of using bold text, use proper markdown heading syntax for consistency with the rest of the document.
-**Pip Install** +### Pip Install -**Local Install** +### Local InstallAlso applies to: 61-61
🧰 Tools
🪛 Markdownlint
55-55: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
docs/prediction.md (3)
16-22
: Consider documenting the command name change and supported models.The command has been renamed from
meds-tab-xgboost
tomeds-tab-model
, which suggests support for multiple model types. Consider adding a brief explanation of:
- Why the command was renamed
- What other models are supported besides XGBoost
Also applies to: 86-91
26-27
: Add explanation for the min_code_inclusion_count parameter.The
min_code_inclusion_count
parameter appears to be a crucial threshold, but its purpose and impact on model training aren't explained. Consider adding:
- What this parameter controls
- How it affects feature selection
- Guidelines for choosing an appropriate value
141-141
: Enhance the footnote explanation.While the footnote explains how the "Number of Included Codes" relates to the minimum threshold, it would be helpful to add:
- The impact of this relationship on model performance
- Whether these numbers are expected/optimal for each task
docs/overview.md (5)
1-4
: Improve heading structure and formatting.The document structure needs the following improvements:
- Remove trailing colons from headings
- Use consistent heading levels (h1 -> h2 -> h3)
Apply this diff to fix the heading structure:
-# Core CLI Scripts Overview +# Core CLI Scripts Overview + +## Overview We provide a set of core CLI scripts to facilitate the tabularization and modeling of MEDS data. These scripts are designed to be run in sequence to transform raw MEDS data into tabularized data and train a model on the tabularized data. The following is a high-level overview of the core CLI scripts:
23-24
: Add language specifiers to code blocks.Several code blocks are missing language specifiers, which affects syntax highlighting and readability. Add appropriate language specifiers (e.g.,
text
,bash
, etc.) to all code blocks.Also applies to: 59-62, 80-86, 117-119, 123-125, 129-138, 168-170, 174-176, 180-191
🧰 Tools
🪛 Markdownlint
23-23: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
64-64
: Improve section transitions and formatting.The transitions between sections need improvement:
- Remove trailing periods from headings
- Add consistent spacing between sections
- Use consistent heading format for all CLI commands
Also applies to: 140-142
🧰 Tools
🪛 LanguageTool
[typographical] ~64-~64: The word “thus” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...o a uniquesubject_id
andtimestamp
combination, thus rows are duplicated across multiple tim...(THUS_SENTENCE)
🪛 Markdownlint
64-64: Punctuation: '.'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
89-91
: Add explanations for environment variables.The code examples use environment variables (e.g.,
$MEDS_RESHARD_DIR
,$OUTPUT_TABULARIZATION_DIR
,$TASK
) without explaining their expected values. Consider:
- Adding a section explaining all required environment variables
- Providing example values for these variables
- Including a minimal working example with literal values
Also applies to: 155-164, 198-205, 222-231
7-7
: Improve language and grammar.Consider the following language improvements:
- Line 7: Rephrase "We shard the data" to "The data is sharded"
- Lines 55, 93: Replace "very quickly" with "efficiently" or "rapidly"
- Line 144: Fix agreement in "labels that has pulled" to "labels pulled"
Also applies to: 55-55, 93-93, 144-144
🧰 Tools
🪛 LanguageTool
[grammar] ~7-~7: Check that the noun ‘shard’ after the pronoun ‘We’ is correct. It’s possible that you may need to switch to a possessive pronoun, or use another part of speech.
Context: ...igh memory usage and slow compute time. We shard the data into small shards to reduce th...(PRP_VB)
docs/gen_ref_pages.py (1)
15-19
: Consider adding logging for better traceabilityIn the loop starting at line 15, adding logging statements can help in tracing which files are being processed. This can be beneficial for debugging and confirming that all intended files are included.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
docs/assets/dark_purple_meds_tab.png
is excluded by!**/*.png
docs/assets/light_logo.png
is excluded by!**/*.png
docs/assets/light_purple_meds_tab.png
is excluded by!**/*.png
📒 Files selected for processing (22)
- .pre-commit-config.yaml (1 hunks)
- README.md (2 hunks)
- docs/Makefile (0 hunks)
- docs/README.MD (1 hunks)
- docs/gen_ref_pages.py (1 hunks)
- docs/generate.sh (0 hunks)
- docs/implementation.md (3 hunks)
- docs/index.md (1 hunks)
- docs/javascripts/mathjax.js (1 hunks)
- docs/make.bat (0 hunks)
- docs/overrides/main.html (1 hunks)
- docs/overview.md (1 hunks)
- docs/prediction.md (5 hunks)
- docs/requirements.txt (0 hunks)
- docs/source/_static/switcher.json (0 hunks)
- docs/source/conf.py (0 hunks)
- docs/source/index.md (0 hunks)
- docs/source/overview.md (0 hunks)
- docs/terminology.md (1 hunks)
- docs/tutorial.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyproject.toml (1 hunks)
💤 Files with no reviewable changes (8)
- docs/Makefile
- docs/generate.sh
- docs/make.bat
- docs/requirements.txt
- docs/source/_static/switcher.json
- docs/source/conf.py
- docs/source/index.md
- docs/source/overview.md
✅ Files skipped from review due to trivial changes (4)
- docs/index.md
- docs/overrides/main.html
- docs/terminology.md
- docs/tutorial.md
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~40-~40: Opting for a less wordy alternative here can improve the clarity of your writing.
Context: ...n the MEDS-Tab ecosystem. This approach not only simplifies the process but also ensures high-quality, reproducible results for ...(NOT_ONLY_ALSO)
docs/overview.md
[grammar] ~7-~7: Check that the noun ‘shard’ after the pronoun ‘We’ is correct. It’s possible that you may need to switch to a possessive pronoun, or use another part of speech.
Context: ...igh memory usage and slow compute time. We shard the data into small shards to reduce th...(PRP_VB)
[style] ~55-~55: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...is stage is not parallelized as it runs very quickly. ??? note "Args Description" ``` - `i...(EN_WEAK_ADJECTIVE)
[typographical] ~64-~64: The word “thus” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...o a uniquesubject_id
andtimestamp
combination, thus rows are duplicated across multiple tim...(THUS_SENTENCE)
[style] ~93-~93: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...is stage is not parallelized as it runs very quickly. ??? note "Args Description" - `input_...(EN_WEAK_ADJECTIVE)
[uncategorized] ~140-~140: Loose punctuation mark.
Context: ...value. ``` 5.meds-tab-cache-task
: Aligns task-specific labels with the n...(UNLIKELY_OPENING_PUNCTUATION)
[grammar] ~144-~144: Possible subject-verb agreement error detected.
Context: ...a specific task$TASK
and labels that has pulled from [ACES](https://github.com/j...(PLURAL_THAT_AGREEMENT)
[style] ~271-~271: Using many exclamation marks might seem excessive (in this case: 18 exclamation marks for a text that’s 6316 characters long)
Context: ...es, during hyperparameter tuning! ```` !!! note "Code Inclusion Parameters" ``` I...(EN_EXCESSIVE_EXCLAMATION)
🪛 Markdownlint
README.md
55-55: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
61-61: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
docs/README.MD
15-15: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
docs/overview.md
5-5: Expected: h2; Actual: h4
Heading levels should only increment by one level at a time(MD001, heading-increment)
5-5: Punctuation: ':'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
36-36: Punctuation: ':'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
64-64: Punctuation: '.'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
98-98: Punctuation: ':'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
193-193: Punctuation: ':'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
23-23: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
59-59: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
80-80: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
117-117: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
123-123: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
129-129: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
168-168: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
174-174: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
180-180: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
249-249: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
274-274: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
284-284: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
🔇 Additional comments (19)
docs/javascripts/mathjax.js (2)
1-12
:⚠️ Potential issueReview and verify MathJax configuration patterns.
A few concerns with the current configuration:
- The double backslashes in the delimiters might be excessive. Single backslashes are typically sufficient for MathJax.
- The
ignoreHtmlClass
regex pattern ".*|" looks suspicious and might not work as intended.- Consider adding error handling for MathJax initialization.
Let's verify if these patterns are consistent with MkDocs best practices:
Consider applying these fixes:
window.MathJax = { tex: { - inlineMath: [["\\(", "\\)"]], - displayMath: [["\\[", "\\]"]], + inlineMath: [["\\(", "\\)"]], + displayMath: [["\\[", "\\]"]], processEscapes: true, processEnvironments: true }, options: { - ignoreHtmlClass: ".*|", + ignoreHtmlClass: ".*", processHtmlClass: "arithmatex" } };Also, consider adding basic error handling:
+if (typeof window.MathJax === 'undefined') { + console.error('MathJax not loaded properly'); +} window.MathJax = { // ... configuration ... };
14-19
:⚠️ Potential issueAdd error handling and optimize MathJax operations.
The current implementation lacks error handling for the promise chain and might have performance implications.
Consider applying these improvements:
-document$.subscribe(() => { +document$.subscribe(async () => { + try { MathJax.startup.output.clearCache() MathJax.typesetClear() MathJax.texReset() - MathJax.typesetPromise() + await MathJax.typesetPromise() + } catch (error) { + console.error('Failed to process MathJax:', error); + } })Let's verify the document$ observable source:
mkdocs.yml (4)
52-66
: LGTM! Comprehensive markdown extension setup.The markdown extensions are well-configured for technical documentation, including support for math equations, code highlighting, and advanced formatting features.
71-81
: Verify reference page generator script.The configuration references a custom script for generating reference pages that needs to be verified.
#!/bin/bash # Description: Check if reference page generator exists echo "Checking for reference page generator..." if ! fd -t f "gen_ref_pages.py" docs/; then echo "Warning: Reference page generator script not found" fi
17-50
: Verify the existence of theme assets.The theme configuration references custom assets that need to be verified.
✅ Verification successful
Based on the shell script results, I can generate the final response as we have verified both requirements:
Theme assets verification completed successfully
The verification confirms that:
- The logo/favicon file
light_logo.png
exists at the correct pathdocs/assets/
- The custom overrides directory
docs/overrides
existsNote: I chose the tag because both the custom assets file and overrides directory were found at their expected locations, confirming that the theme configuration references valid paths.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if referenced theme assets exist echo "Checking for theme assets..." # Check for logo and favicon fd -t f "light_logo.png" docs/assets/ # Check for custom overrides directory if ! [ -d "docs/overrides" ]; then echo "Warning: Custom overrides directory not found" fiLength of output: 159
67-70
: Verify MathJax configuration file.The configuration references a custom MathJax configuration file that needs to be verified.
✅ Verification successful
Let me check the content of the found MathJax configuration file to ensure it's properly configured.
MathJax configuration is properly set up
The MathJax configuration file exists and contains valid configuration:
- Correctly sets up inline and display math delimiters
- Properly configures processing escapes and environments
- Includes necessary MathJax startup and typeset handling
- Integrates with the document lifecycle using the subscription pattern
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if MathJax configuration exists echo "Checking for MathJax configuration..." if ! fd -t f "mathjax.js" docs/javascripts/; then echo "Warning: MathJax configuration file not found" fiLength of output: 245
Script:
#!/bin/bash # Check the content of the MathJax configuration file cat docs/javascripts/mathjax.jsLength of output: 417
pyproject.toml (2)
36-36
: LGTM: Version constraint for pre-commit is appropriate.The version constraint
pre-commit<4
aligns with the PR objective to address the docformatter pre-commit hook compatibility issue.
40-53
: Consider security implications of git plugins.The MkDocs dependencies are well-structured with pinned versions. However, please be aware that:
mkdocs-git-authors-plugin
may expose contributor informationmkdocs-git-revision-date-localized-plugin
may expose internal timeline informationConsider documenting any sensitive information these plugins might expose and ensure it aligns with your project's privacy requirements.
#!/bin/bash # Check if there's any documentation about git plugins configuration rg -i "git.*plugin|author.*plugin|revision.*plugin" docs/ mkdocs.yml.pre-commit-config.yaml (1)
Line range hint
1-124
: Configuration structure looks goodThe pre-commit configuration is well-organized with:
- Properly pinned versions for all hooks
- Clear categorization of hooks by purpose
- Appropriate arguments and exclusions for each hook
docs/implementation.md (2)
32-32
: LGTM! Clear explanation of feature counts.The added explanation effectively introduces the concept and its importance in the pipeline.
37-37
: LGTM! Consistent terminology usage.The terminology update from "frequencies" to "counts" maintains consistency with other documentation changes.
README.md (2)
68-72
: LGTM!The section provides clear references to both the tutorial and integration tests, which is helpful for users getting started.
Line range hint
16-49
: Excellent documentation restructuring!The new structure with clear sections for:
- Welcome and value proposition
- Step-by-step guide for getting started
- Technical details and capabilities
makes the documentation much more accessible and user-friendly. The flow from high-level overview to technical details is well-organized.
🧰 Tools
🪛 LanguageTool
[style] ~29-~29: Consider a shorter alternative to avoid wordiness.
Context: ...ed across arbitrary tasks and settings. In order to use MEDS-Tab, you will first need to tr...(IN_ORDER_TO_PREMIUM)
[style] ~40-~40: Opting for a less wordy alternative here can improve the clarity of your writing.
Context: ...n the MEDS-Tab ecosystem. This approach not only simplifies the process but also ensures high-quality, reproducible results for ...(NOT_ONLY_ALSO)
🪛 Markdownlint
55-55: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
61-61: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
docs/prediction.md (1)
127-141
: Verify unusually low performance metrics and extreme parameter variations.There are some concerning patterns in the performance tables:
- The eICU In Hospital Mortality (48hr) shows an unusually low AUC of 0.570
- The Minimum Code Inclusion Count varies extremely (from 43 to 335,912)
Consider:
- Verifying these results, especially the low-performing models
- Adding explanations for the wide variation in code inclusion counts
- Discussing the implications of these variations on model reliability
Run this script to verify the metrics:
Also applies to: 171-180
docs/gen_ref_pages.py (5)
14-52
:process_python_files()
function is well-implementedThe
process_python_files()
function effectively processes Python files to generate API reference pages and navigation. The handling of special cases like__init__.py
and inclusion ofREADME.md
files is correctly implemented.
54-72
:process_yaml_files()
function is well-implementedThe
process_yaml_files()
function successfully processes YAML configuration files to generate documentation and navigation entries. The inclusion of YAML content within code blocks enhances readability.
74-83
: Efficient generation of navigation summariesThe creation of
SUMMARY.md
files for both API and configuration references ensures that the documentation is well-organized and easily navigable.
43-46
: Ensureident
correctly reflects module hierarchyThe construction of
ident
at lines 43-46 determines how the modules are referenced in the documentation. Verify that this accurately represents the module hierarchy, especially for nested packages.You can check the generated documentation to confirm that all modules are correctly documented.
36-39
: Verify the inclusion of appropriateREADME.md
contentIncluding
README.md
files in the documentation is helpful, but ensure that the content is suitable for the API reference and does not contain installation instructions or other unrelated information.You can run the following script to list and review the
README.md
files being included:✅ Verification successful
Let me try a different approach to verify the README.md files that would be included in the documentation.
This will help us:
- Find all README.md files in the repository
- Review their content to assess if they are suitable for API documentation
Additionally, let's look at the documentation generation script to understand the context better:
Based on the provided context, I can now generate a final response. The code and README content have given me enough information to assess the situation.
README.md content is appropriate for API documentation
The repository's README.md content is well-structured and contains relevant technical information that would be valuable in API documentation, including:
- Technical overview of the library's functionality
- Explanation of core concepts (e.g., the detailed explanation of what "tabular pipelines" means)
- Component descriptions and architectural details
- Roadmap and limitations
The documentation generation script (
gen_ref_pages.py
) only includes README.md files that are within Python package directories (next to__init__.py
files), which is a good practice as these typically contain module-specific technical documentation rather than installation or general project information.Note: The main README.md won't be included in the API reference as it's not within the
src/MEDS_tabular_automl
directory, and any module-level README.md files will be appropriately placed in their respective module documentation sections.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: List and display README.md files included in the API documentation # Find all README.md files within src/MEDS_tabular_automl excluding the configs directory find src/MEDS_tabular_automl -path '*/configs/*' -prune -o -name 'README.md' -print -exec cat {} \;Length of output: 102
Script:
#!/bin/bash # First, let's understand where the documentation is being generated from fd README.md --type f # Then, let's examine the content of these README files fd README.md --type f --exec cat {}Length of output: 8905
Script:
#!/bin/bash cat docs/gen_ref_pages.py
Length of output: 2660
nav: | ||
- "Home": index.md | ||
- "Overview": overview.md | ||
- "MIMICIV Tutorial": tutorial.md | ||
- "Terminology": terminology.md | ||
- "Prediction": prediction.md | ||
- "Profiling": profiling.md | ||
- "API Reference": reference/api/ | ||
- "Config Reference": reference/config/ | ||
- "Issues": https://github.com/mmcdermott/MEDS_Tabular_AutoML/issues | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Missing reference directories in documentation
All main documentation files exist, but the reference directories (reference/api/
and reference/config/
) specified in the navigation are missing from the docs folder. This needs to be addressed as it will lead to broken navigation links.
docs/reference/api/
directory is missingdocs/reference/config/
directory is missing
🔗 Analysis chain
Verify the existence of all referenced documentation files.
The navigation structure looks well-organized, but we should verify that all referenced Markdown files exist.
Let me verify the existence of the reference directories as well, since they were part of the navigation structure but weren't checked in the first script.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if all referenced markdown files exist in the docs directory
# List of files to check (extracted from nav section)
files=("index.md" "overview.md" "tutorial.md" "terminology.md" "prediction.md" "profiling.md")
echo "Checking for required documentation files..."
for file in "${files[@]}"; do
if ! fd -t f "^${file}$" docs/; then
echo "Warning: $file not found"
fi
done
Length of output: 663
Script:
#!/bin/bash
# Check if the reference directories exist and contain content
echo "Checking reference directories..."
fd -t d "^reference$" docs/
echo -e "\nChecking contents of reference/api and reference/config..."
fd . "docs/reference/api"
fd . "docs/reference/config"
Length of output: 509
@@ -33,10 +33,24 @@ generate-subsets = "MEDS_tabular_automl.scripts.generate_subsets:main" | |||
|
|||
|
|||
[project.optional-dependencies] | |||
dev = ["pre-commit"] | |||
dev = ["pre-commit<4"] | |||
tests = ["pytest", "pytest-cov", "rootutils"] | |||
profiling = ["mprofile", "matplotlib"] | |||
autogluon = ["autogluon; python_version=='3.11.*'"] # Environment marker to restrict AutoGluon to Python 3.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Based on the results, I can now generate the final response since we have sufficient information from the documentation and project configuration files.
Python version requirements need to be updated in documentation
The project has inconsistent Python version requirements:
- README.md shows Python 3.12+ support (badge shows "Python_3.12+")
- pyproject.toml has
requires-python = ">=3.11"
- autogluon is restricted to Python 3.11 specifically
Please update the documentation to clearly state:
- The general Python version requirement (>=3.11)
- That the optional autogluon integration specifically requires Python 3.11
🔗 Analysis chain
Verify Python version requirement documentation.
The environment marker for autogluon ensures compatibility with Python 3.11. Please ensure this requirement is clearly documented in the project's installation guide.
Let me try a different approach to search for documentation about Python version requirements.
This script will:
- Show the content of README.md which often contains installation requirements
- Find any markdown/rst files related to installation or setup
- Check pyproject.toml for context around version requirements
The results will help us determine if the Python 3.11 requirement for autogluon is properly documented.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if Python version requirement is documented
rg -i "python.*3\.11|python.*version" README.md docs/
Length of output: 70
Script:
#!/bin/bash
# Check various documentation files and patterns
echo "=== Checking README.md ==="
cat README.md
echo -e "\n=== Checking for installation/setup docs ==="
fd -e md -e rst "install|setup|requirement"
echo -e "\n=== Checking pyproject.toml for other version markers ==="
cat pyproject.toml
Length of output: 11566
@@ -102,4 +102,4 @@ The final stage uses the processed and cached data to train an XGBoost model. Th | |||
|
|||
- **Iterator for Data Loading**: Custom iterators are designed to load sparse matrices efficiently into the XGBoost training process, which can handle sparse inputs natively, thus maintaining high computational efficiency. | |||
- **Training and Validation**: The model is trained using the tabular data, with evaluation steps that include early stopping to prevent overfitting and tuning of hyperparameters based on validation performance. | |||
- **Hyperaparameter Tuning**: We use [optuna](https://optuna.org/) to tune over XGBoost model pramters, aggregations, window sizes, and the minimimum code inclusion frequency. | |||
- **Hyperaparameter Tuning**: We use [optuna](https://optuna.org/) to tune over XGBoost model pramters, aggregations, window sizes, and the minimimum code inclusion count. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typos in hyperparameter tuning description.
There are several spelling errors in this line:
- "pramters" should be "parameters"
- "minimimum" should be "minimum"
Apply this correction:
- We use [optuna](https://optuna.org/) to tune over XGBoost model pramters, aggregations, window sizes, and the minimimum code inclusion count.
+ We use [optuna](https://optuna.org/) to tune over XGBoost model parameters, aggregations, window sizes, and the minimum code inclusion count.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- **Hyperaparameter Tuning**: We use [optuna](https://optuna.org/) to tune over XGBoost model pramters, aggregations, window sizes, and the minimimum code inclusion count. | |
- **Hyperaparameter Tuning**: We use [optuna](https://optuna.org/) to tune over XGBoost model parameters, aggregations, window sizes, and the minimum code inclusion count. |
- [MEDS ETL](https://github.com/Medical-Event-Data-Standard/meds_etl) for a collection of ETLs from common data formats to MEDS. The package library currently supports MIMIC-IV, OMOP v5, and MEDS FLAT (a flat version of MEDS). | ||
|
||
### II. Run MEDS-Tab | ||
|
||
- Run the MEDS-Tab Command-Line Interface tool (`MEDS-Tab-cli`) to extract cohorts based on your task - check out the [Usage Guide](https://meds-tab--36.org.readthedocs.build/en/36/overview.html#core-cli-scripts-overview)! | ||
- Run the MEDS-Tab Command-Line Interface tool (`MEDS-Tab-cli`) to extract cohorts based on your task - check out the [Usage Guide](https://meds-tab--36.org.readthedocs.build/en/36/overview.html#core-cli-scripts-overview) and the [MIMIC-IV tutorial](https://github.com/mmcdermott/MEDS_Tabular_AutoML/tree/main/MIMICIV_TUTORIAL)! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix documentation link to use the stable version.
The current link points to a specific build version (meds-tab--36.org.readthedocs.build/en/36/
). This should be updated to point to the stable documentation URL.
-check out the [Usage Guide](https://meds-tab--36.org.readthedocs.build/en/36/overview.html#core-cli-scripts-overview)
+check out the [Usage Guide](https://meds-tab.readthedocs.io/en/stable/overview.html#core-cli-scripts-overview)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- Run the MEDS-Tab Command-Line Interface tool (`MEDS-Tab-cli`) to extract cohorts based on your task - check out the [Usage Guide](https://meds-tab--36.org.readthedocs.build/en/36/overview.html#core-cli-scripts-overview) and the [MIMIC-IV tutorial](https://github.com/mmcdermott/MEDS_Tabular_AutoML/tree/main/MIMICIV_TUTORIAL)! | |
- Run the MEDS-Tab Command-Line Interface tool (`MEDS-Tab-cli`) to extract cohorts based on your task - check out the [Usage Guide](https://meds-tab.readthedocs.io/en/stable/overview.html#core-cli-scripts-overview) and the [MIMIC-IV tutorial](https://github.com/mmcdermott/MEDS_Tabular_AutoML/tree/main/MIMICIV_TUTORIAL)! |
with mkdocs_gen_files.open(full_doc_path, "w") as fd: | ||
fd.write(f"# {path.stem}\n\n") | ||
fd.write("```yaml\n") | ||
fd.write(path.read_text()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential file read errors when processing YAML files
When reading YAML files at line 68, consider adding error handling to manage situations where a file might not be read successfully due to permissions or other I/O issues.
You can modify the code to include a try-except block:
fd.write("```yaml\n")
- fd.write(path.read_text())
+ try:
+ fd.write(path.read_text())
+ except Exception as e:
+ fd.write(f"# Error reading file {path.name}: {e}")
fd.write("\n```\n")
|
||
def process_yaml_files(): | ||
config_dir = src / "configs" | ||
for path in sorted(config_dir.rglob("*.yaml")) + sorted(config_dir.rglob("*.yml")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify YAML file discovery
At line 56, you're combining two sorted lists to find YAML files. Consider using a single rglob
with a pattern that matches both .yaml
and .yml
extensions to simplify the code.
You can modify the code as follows:
- for path in sorted(config_dir.rglob("*.yaml")) + sorted(config_dir.rglob("*.yml")):
+ for path in sorted(config_dir.rglob("*.y*ml")):
Committable suggestion was skipped due to low confidence.
…is incompatible with material them markdown admonitions, and reverted removal of admonitions
The |
Changes
Summary by CodeRabbit
Release Notes
New Features
Documentation Improvements
Bug Fixes
Chores